Amplitude Modulation Maps for Robust Speech Recognition
Abstract
Two recognition tasks are discussed in which pre-processing based on amplitude modulation (AM) maps is compared with other feature extraction strategies. The first task shows how the AM map representation can be used to segregate voiced speech signals from one another. The second shows how the AM representation can be used for robust digit recognition in additive noise. In the first task, natural vowels from the TIMIT database are presented concurrently with a second vowel and recognised using a multilayer perceptron. AM map based pre-processing is compared with Parsons' harmonic selection algorithm and with a strategy using no noise reduction. The proposed feature extraction algorithm leads to an improvement in recognition equivalent to a 6 dB increase in signal-to-noise ratio (SNR) over the other algorithms. In the second task, digits (from OGI Alphadigits) were presented in clean conditions, in white noise, and in rapidly varying high-pass/low-pass noise. Recognition performance, based on an 8-state left-to-right hidden Markov model (HMM), is compared for conventional mel-frequency cepstral coefficients (MFCCs), auditory filterbank output, and the spectra recovered from AM maps. For clean speech we obtain error rates of 6-8% for all three strategies, but as the noise level increases, recognition scores consistently show AM maps to be the most robust of the three.
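To make the SNR figures in the abstract concrete, the following is a minimal sketch of how a signal can be mixed with white noise at a chosen SNR, and of what a 6 dB difference means in terms of noise power. It is not the authors' experimental setup: the test tone, the sample rate, and the function names `add_white_noise` and `measured_snr_db` are assumptions made for illustration.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def add_white_noise(signal, snr_db, seed=0):
    """Return signal plus Gaussian white noise scaled to the requested SNR (dB)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Choose the gain so that 10*log10(P_signal / P_noise) equals snr_db.
    target_noise_power = power(signal) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / power(noise))
    return [s + gain * n for s, n in zip(signal, noise)]


def measured_snr_db(signal, noisy):
    """SNR in dB, computed from the clean signal and its noisy version."""
    residual = [y - s for s, y in zip(signal, noisy)]
    return 10.0 * math.log10(power(signal) / power(residual))


# Hypothetical stand-in for a speech frame: a 200 Hz tone sampled at 8 kHz.
tone = [math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(8000)]
noisy_0db = add_white_noise(tone, 0.0)
noisy_6db = add_white_noise(tone, 6.0)

# Ratio of noise powers between two conditions that differ by 6 dB in SNR.
power_ratio = 10.0 ** (6.0 / 10.0)
```

Under these assumptions, a 6 dB gain in SNR corresponds to the noise power dropping by a factor of about four for the same signal level, which gives a sense of the size of the improvement reported for the vowel task.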
Similar resources
Robust energy demodulation based on continuous models with application to speech recognition
In this paper, we develop improved schemes for simultaneous speech interpolation and demodulation based on continuous-time models. This leads to robust algorithms to estimate the instantaneous amplitudes and frequencies of the speech resonances and extract novel acoustic features for ASR. The continuous-time models retain the excellent time resolution of the ESAs based on discrete energy operato...
Amplitude Modulation Filters as Feature Sets for Robust ASR: Constant Absolute or Relative Bandwidth?
Many research efforts in the field of feature extraction for automatic speech recognition are focused on analyzing slow amplitude fluctuations of speech. In this study, the importance of spectral and temporal resolution for amplitude modulation frequency analysis is investigated in order to provide guidance for appropriate filter design. Therefore, different wavelet and Fourier transfor...
Smoothed Nonlinear Energy Operator-Based Amplitude Modulation Features for Robust Speech Recognition
In this paper we present a robust feature extractor that uses smoothed nonlinear energy operator (SNEO)-based amplitude modulation features for a large vocabulary continuous speech recognition (LVCSR) task. SNEO estimates the energy required to produce the AM-FM signal, and then the estimated energy is separated into its amplitude and frequency components using an energy separa...
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have shown their performance in speech recognition systems for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...
Improving the performance of MFCC for Persian robust speech recognition
Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...